This is an informal, exploratory document. Its purpose is to inform and guide the collaboration between Joe Brew and Alberto García-Basteiro in comparing WHO and IHME TB and TB-HIV mortality data.
The below scatterplot shows the correlation between WHO (x-axis) estimates and IHME (y-axis) estimates, with each point colored by its (WHO-defined) region. The (orange) line of best fit is for the entire dataset.
We can also fit a separate line of best fit for each region.
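A minimal sketch of the per-region fit. It assumes the data is available as (region, WHO deaths, IHME deaths) tuples and that "line of best fit" means an ordinary least squares fit of IHME on WHO. The function names and the toy rows are hypothetical illustrations, not the data or code used for the chart.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_by_region(rows):
    """rows: iterable of (region, who_deaths, ihme_deaths).
    Returns {region: (slope, intercept)} with WHO on x and IHME on y.
    Regions with fewer than two distinct WHO values are skipped (no line is defined)."""
    groups = defaultdict(lambda: ([], []))
    for region, who, ihme in rows:
        groups[region][0].append(who)
        groups[region][1].append(ihme)
    return {
        region: fit_line(xs, ys)
        for region, (xs, ys) in groups.items()
        if len(set(xs)) >= 2
    }


# Hypothetical toy rows, for illustration only.
toy_rows = [
    ("AFR", 100, 150), ("AFR", 200, 250), ("AFR", 300, 350),
    ("EUR", 10, 20), ("EUR", 20, 40),
]
print(fit_by_region(toy_rows))
```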
We can also examine the relative difference in estimates, by WHO region. The below violin charts show the distribution of relative difference (defined here as the absolute difference in total number of TB deaths, divided by the larger of the two sources' estimates). The jittered points represent individual countries.
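The relative-difference definition above can be written as a small function. This is a sketch assuming non-negative death counts; the name `relative_difference` and the choice to return 0 when both estimates are 0 are assumptions, not part of the original analysis.

```python
def relative_difference(who, ihme):
    """|IHME - WHO| divided by the larger of the two estimates.
    For non-negative counts the result lies between 0 and 1, and it is
    symmetric: swapping the two sources gives the same value."""
    larger = max(who, ihme)
    if larger == 0:
        # Assumption: two zero estimates are treated as "no difference".
        return 0.0
    return abs(ihme - who) / larger


print(relative_difference(100, 150))
print(relative_difference(150, 100))
```

Because the denominator is the larger estimate, the metric is bounded, which keeps the violin plots on a common 0 to 1 scale.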
The above metric of relative difference does not depend on whether the IHME or WHO estimate is higher. To see the direction of the discrepancy, we can instead examine the ratio of IHME deaths to WHO deaths.
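Unlike the relative difference, a ratio keeps the direction of the discrepancy. A minimal sketch computing both directions (IHME over WHO, and its inverse), assuming a hypothetical mapping of country to (WHO deaths, IHME deaths); returning `None` for a zero denominator is an assumption about how such countries would be handled.

```python
def death_ratios(data):
    """data: {country: (who_deaths, ihme_deaths)}.
    Returns {country: (ihme_over_who, who_over_ihme)}; a ratio is None
    when its denominator is zero."""
    out = {}
    for country, (who, ihme) in data.items():
        ihme_over_who = ihme / who if who else None
        who_over_ihme = who / ihme if ihme else None
        out[country] = (ihme_over_who, who_over_ihme)
    return out


# Hypothetical toy data.
print(death_ratios({"A": (100, 150), "B": (0, 10)}))
```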
A few outliers make this chart difficult to interpret. Accordingly, we remove those outliers and look only at those countries for which the two estimates differ by less than a factor of 5.
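A sketch of the outlier filter. It reads "less than 5x" as a factor of 5 in either direction (1/5 < IHME/WHO < 5); that reading is an assumption, and a one-sided cut (IHME/WHO < 5) is the other plausible one. The function name and the `factor` parameter are hypothetical.

```python
def within_factor(who, ihme, factor=5.0):
    """True when the two estimates differ by less than `factor` in either
    direction. Countries with a zero estimate are excluded (ratio undefined)."""
    if who <= 0 or ihme <= 0:
        return False
    ratio = ihme / who
    return 1 / factor < ratio < factor


# Hypothetical toy data: {country: (who_deaths, ihme_deaths)}.
toy = {"A": (100, 400), "B": (100, 600), "C": (600, 100)}
kept = {c: v for c, v in toy.items() if within_factor(*v)}
print(kept)
```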
As a complement to the above, we can also examine the ratio of WHO deaths over IHME deaths.
The above charts show us differences by region (with each point being a country). It is also useful to examine country-specific differences.
In the below chart we identify those countries for which the ratio of IHME-estimated deaths (TB + HIV/TB) to WHO-estimated deaths is greatest (top 10) and smallest (bottom 10).
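Selecting the top and bottom 10 countries by a metric is a simple sort. A minimal sketch, assuming the metric (here, the IHME/WHO ratio) has already been computed per country; the function name and toy values are hypothetical.

```python
def top_and_bottom(values, n=10):
    """values: {country: metric}.
    Returns (top, bottom): the n largest (descending) and the n smallest
    (ascending) as lists of (country, metric) pairs."""
    ordered = sorted(values.items(), key=lambda item: item[1])
    bottom = ordered[:n]
    top = ordered[::-1][:n]
    return top, bottom


# Hypothetical toy ratios.
top, bottom = top_and_bottom({"A": 1.0, "B": 5.0, "C": 3.0}, n=2)
print(top, bottom)
```

The same helper works for the inverse ratio (WHO over IHME) and, later, for the cases-over-deaths indicators. With fewer than 2n countries the two lists overlap.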
As a complement to the above, we can examine the inverse: WHO deaths over IHME deaths.
Rather than ratios, we can also examine absolute differences. The below bar chart shows those countries with the greatest absolute difference (IHME estimated deaths minus WHO estimated deaths).
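A sketch of the absolute-difference ranking, covering both IHME minus WHO and the complementary WHO minus IHME. The input layout, function name, and `who_minus_ihme` flag are hypothetical.

```python
def largest_differences(data, n=10, who_minus_ihme=False):
    """data: {country: (who_deaths, ihme_deaths)}.
    Returns the n countries with the largest IHME - WHO difference
    (or WHO - IHME when who_minus_ihme is True), largest first."""
    diffs = {
        country: (who - ihme) if who_minus_ihme else (ihme - who)
        for country, (who, ihme) in data.items()
    }
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical toy data.
toy = {"A": (100, 300), "B": (200, 150), "C": (50, 60)}
print(largest_differences(toy, n=2))
print(largest_differences(toy, n=2, who_minus_ihme=True))
```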
As a complement to the above, we can examine WHO deaths minus IHME deaths.
All of the above are biased by small numbers (i.e., countries with very few cases are the most likely to have extreme ratios). Accordingly, we can instead examine an indicator suggested by Frank: the number of reported cases divided by the number of deaths.
This indicator is, in fact, two indicators. One is the reported cases divided by deaths (per the WHO), and the other is per the IHME. The below histograms show the country-specific distribution of these indicators.
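The two cases-over-deaths indicators share a numerator (reported cases) and differ only in which source supplies the deaths. A minimal sketch, assuming a hypothetical mapping of country to (reported cases, WHO deaths, IHME deaths); returning `None` when deaths are zero is an assumption.

```python
def cases_per_death_indicators(data):
    """data: {country: (reported_cases, who_deaths, ihme_deaths)}.
    Returns {country: (cases_per_who_death, cases_per_ihme_death)};
    an indicator is None when the corresponding death count is zero."""
    out = {}
    for country, (cases, who, ihme) in data.items():
        per_who = cases / who if who else None
        per_ihme = cases / ihme if ihme else None
        out[country] = (per_who, per_ihme)
    return out


# Hypothetical toy data.
print(cases_per_death_indicators({"A": (1000, 100, 200), "B": (50, 0, 5)}))
```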
We can examine the cases-over-deaths metrics at the country-specific level as well. As before, we take only the top and bottom 10 countries.
The below chart shows the estimated percentage of new TB cases with rifampicin-resistant TB (x-axis) and the number of reported cases per WHO-estimated death (y-axis):
The below chart is identical, but using IHME deaths rather than WHO.
We can also examine the same x-axis (resistance), but relative to other indicators.
In the below chart, a value n above 0 means that the WHO estimate is n percent higher than the IHME estimate. A value n below 0 means that the IHME estimate is |n| percent higher than the WHO estimate.
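The exact formula behind this chart is not shown. One way to compute an indicator that matches the wording is below: each side is expressed as a percentage of the lower estimate, with the sign showing which source is higher. The function name and the `None` result for zero estimates are assumptions. Note that the simpler (WHO - IHME) / IHME x 100 would not match the description for negative values.

```python
def signed_percent_difference(who, ihme):
    """Positive n: the WHO estimate is n% higher than the IHME estimate.
    Negative n: the IHME estimate is |n|% higher than the WHO estimate.
    Returns None when either estimate is zero (percentage undefined)."""
    if who == 0 or ihme == 0:
        return None
    if who >= ihme:
        return (who / ihme - 1) * 100
    return -(ihme / who - 1) * 100


print(signed_percent_difference(150, 100))
print(signed_percent_difference(100, 150))
```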
In the above chart, it may seem that there are no negative values. This is not the case. It is simply that the few countries where the WHO and IHME estimates differ most extremely pull the scale in one direction:
The below map is similar to the above, but rather than showing the “relative” difference in WHO / IHME estimates, it shows the absolute difference (i.e., WHO minus IHME instead of WHO divided by IHME).
The above map may appear overwhelmingly green (i.e., as if most countries have a near-zero difference). This is due to the scale: a very few extreme observations stretch it. To get a more granular look, we can take the square root of our indicator (below).
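Because WHO minus IHME can be negative, a plain square root is undefined for part of the data. A common workaround, assumed here rather than taken from the original analysis, is a signed square root: take the root of the magnitude and keep the original sign, which compresses extreme values while preserving direction. The function name is hypothetical.

```python
import math


def signed_sqrt(x):
    """Square root of |x|, carrying the sign of x (so -400 -> -20)."""
    return math.copysign(math.sqrt(abs(x)), x)


# Hypothetical toy differences (WHO minus IHME).
print([signed_sqrt(d) for d in [-400, 0, 9, 10000]])
```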